README
Jetson Leder-Luis
April, 2023
"Can Whistleblowers Root Out Public Expenditure Fraud? Evidence from Medicare"
Review of Economics and Statistics


These files take the results of a scraper of the DOJ website, split into main and archive sections (mirroring how the site presents its press releases), and generate a universe of False Claims Act Medicare press releases.

--------------------------------------------------

DOJScraper.py

This file scrapes the main PR website and produces PRList.csv

Step 1 before running: go to the Justice News website, find the number of results pages, and set the n-page variable accordingly.
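As a rough illustration of how the page count is used, the Python sketch below expands a results-URL template into one URL per page. The function name `page_urls`, the `{page}` placeholder, and zero-based page numbering are assumptions for illustration only; the actual URL and page handling live in DOJScraper.py.

```python
def page_urls(url_template: str, n_pages: int) -> list[str]:
    """Return one URL per results page.

    Hypothetical helper: assumes the template contains a `{page}`
    placeholder and that pages are numbered from 0.
    """
    if n_pages < 1:
        raise ValueError("n_pages must be at least 1")
    return [url_template.format(page=i) for i in range(n_pages)]
```

For example, `page_urls("https://example.org/news?page={page}", n_pages)` would produce the list of pages to fetch; the example.org URL is a placeholder for the real search URL.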

We match the archive on the words "False Claims Act" via the search query in the URL. Depending on how the site's search works, this may return any article containing all three words rather than the exact phrase; most matches are relevant, and the rest are filtered out later.
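Because the site search can return articles that contain the three words anywhere, a later pass has to keep only the exact phrase. A minimal Python sketch of such a filter follows. It is not the project's own filtering code, and the `text` field name is hypothetical.

```python
import re

# Exact phrase, case-insensitive; \s+ tolerates line breaks or repeated
# spaces between the words.
FCA_PATTERN = re.compile(r"\bfalse\s+claims\s+act\b", re.IGNORECASE)


def mentions_fca(text: str) -> bool:
    """True if text contains the full phrase "False Claims Act"."""
    return bool(FCA_PATTERN.search(text))


def filter_exact_phrase(records: list[dict], field: str = "text") -> list[dict]:
    """Keep only records whose `field` contains the exact phrase.

    `field` is a hypothetical column name; missing values count as no match.
    """
    return [r for r in records if mentions_fca(r.get(field) or "")]
```

Matching on a regular expression rather than a plain substring keeps the filter robust to capitalization and whitespace differences in scraped text.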

--------------------------------------------------

DOJArchiveScraper.py 

This file scrapes the archive and produces ArchivePRList.csv

It goes through the archive by month and year.
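Going through the archive by month and year amounts to enumerating every (year, month) pair in a date range. The sketch below assumes inclusive start and end months; `month_year_pairs` is a hypothetical helper, not a function from DOJArchiveScraper.py.

```python
from datetime import date


def month_year_pairs(start: date, end: date) -> list[tuple[int, int]]:
    """Return (year, month) pairs from start's month through end's month, inclusive."""
    if start > end:
        raise ValueError("start must not be after end")
    pairs = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        pairs.append((year, month))
        # Advance one month, rolling over into January of the next year.
        if month == 12:
            year, month = year + 1, 1
        else:
            month += 1
    return pairs
```

Each pair would then be substituted into the archive's month/year listing URL before scraping that page.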


--------------------------------------------------


DOJUniverse.R

This file further cleans PRList.csv and ArchivePRList.csv and combines them into PRUniverse.csv.
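DOJUniverse.R itself is an R script; for consistency with the other examples, the sketch below shows a combining step in Python. It assumes both CSVs have a header row and share a `url` column identifying each press release (a hypothetical column name), and it keeps the first copy of any release that appears in both files. The actual cleaning rules in DOJUniverse.R may differ.

```python
import csv
from pathlib import Path


def read_rows(path: Path) -> list[dict]:
    """Read a CSV with a header row into a list of dicts."""
    with path.open(newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def dedupe_rows(rows: list[dict], key: str = "url") -> list[dict]:
    """Keep the first row for each non-empty `key`; rows lacking a key are kept."""
    seen: set[str] = set()
    kept: list[dict] = []
    for row in rows:
        k = (row.get(key) or "").strip()
        if k:
            if k in seen:
                continue
            seen.add(k)
        kept.append(row)
    return kept


def write_rows(rows: list[dict], out_path: Path) -> None:
    """Write rows to CSV, using the union of their column names as the header."""
    fieldnames: list[str] = []
    for row in rows:
        for name in row:
            if name not in fieldnames:
                fieldnames.append(name)
    with out_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames, restval="")
        writer.writeheader()
        writer.writerows(rows)
```

Usage would be `write_rows(dedupe_rows(read_rows(Path("PRList.csv")) + read_rows(Path("ArchivePRList.csv"))), Path("PRUniverse.csv"))`. Listing the main-site file first means its rows win when a release appears in both files.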

The press releases in PRUniverse.csv were then hand-cleaned into PRCoded.csv; see CodingNotes.